Style Guide

Purpose of this document

This document sets a baseline for code quality throughout a project. It provides guidelines on Python coding best practices, which can be adapted to other programming languages. It also describes what is expected from a developer / data scientist before opening a Pull Request, and from a reviewer to ensure a good and constructive review.

Code reviews

Code reviews are an important part of open source projects. They help ensure the quality of our work and spread understanding of the code across the team.

Cheat sheets

Whatever your level of experience with a topic, your insights can be valuable!

Below are two cheat sheets to follow: one for PR authors and one for PR reviewers.

Author cheat sheet

Usually, a PR corresponds to a single task, i.e., a unit of added value to the codebase, e.g., a feature, a bugfix, documentation, etc. However, if the PR seems too large or complex, it might be an indicator that a task should be split into multiple pieces and have a PR for each of them.

When a review is conducted, be open to feedback.

You can find more detailed guidance for authors in the ISE Code With Engineering Playbook.

Reviewer cheat sheet

Understand the code you are reviewing and take your time
Be inclusive, and foster a positive code review culture
Ensure code quality
Ask the author for a walkthrough if necessary (a walkthrough should not be used to hide a PR's complexity, but as a way to get up to speed on it quickly)

You can find more detailed guidance for reviewers in the ISE Code With Engineering Playbook.

Tooling used

Code quality can be enforced through tools (linters, formatters, type checkers, etc.) that automatically keep the code consistent. Tooling should be discussed at the beginning of each project to decide which tools best fit it.

Currently, the following tools are recommended:

Black: code formatter
Flake8: linter

Note: By default, we have a pipeline that runs flake8 and markdownlint on every PR. You can also run these tools locally to check that your code is compliant.

Coding guidelines

This coding agreement provides some guidelines for Python, but they can be adapted to any other programming language.

The guidelines in this section must be followed by developers / data scientists, and reviewers must check them during PR review. The tags in brackets next to each section title (e.g., [MUST]) apply to production-ready code. Experimental or exploratory code does not need to strictly follow these guidelines. However, following most of the rules is still recommended, as it makes converting experimental code into production-ready code much easier.

Framework project structure [MUST]

The project structure should be organized in a way that is easy to understand and navigate. The structure should be consistent across all projects. Here is an example of a project structure:

investing-algorithm-framework/
│
├── investing_algorithm_framework/
│   ├── my_module/
│   │   ├── __init__.py
│   │   └── my_module.py
│   │
│   ├── my_other_module/
│   │   ├── __init__.py
│   │   └── my_other_module.py
│   │
│   └── __init__.py            # Package initializer (public API)
│
├── tests/
│   ├── test_my_module.py
│   └── test_my_other_module.py
│
├── docs/                      # Documentation (Sphinx, Markdown, etc.)
├── docusaurus/                # Docusaurus documentation
├── .gitignore                 # Ignore unnecessary files
├── README.md                  # Project documentation
├── pyproject.toml             # Build system and tool configurations
└── LICENSE                    # License file

Python Code Layout [MUST]

imports: standard libraries first, then third-party libraries, then local libraries, each group sorted alphabetically and separated by a blank line.
blank lines: two blank lines surrounding classes and top-level functions. Methods inside classes are separated by a single blank line.
indentation: use 4 spaces (most IDEs convert a tab into 4 spaces by default).
line length: maximum 88 characters (Black's default).

# Standard libs
import std_lib_1
import std_lib_2

# Third party libs
import third_party_lib_1
import third_party_lib_2

# Local libs
from local_lib import class_1, function_1


# After two blank lines
def top_level_function(args: int) -> str:
    # Body
    return str(args)
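The example above covers imports and top-level spacing. The following sketch, built around a hypothetical Circle class, illustrates the remaining layout rules: a single blank line between methods, 4-space indentation, and wrapping a long expression in parentheses to stay within 88 characters.

```python
import math


class Circle:
    """Hypothetical class used only to illustrate layout conventions."""

    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius**2

    def describe(self) -> str:
        # Wrap long expressions inside parentheses to stay under 88 characters
        return (
            f"Circle with radius {self.radius:.2f} "
            f"and area {self.area():.2f}"
        )


def make_unit_circle() -> Circle:
    return Circle(1.0)
```

Running Black on this file should leave it unchanged, which is a quick way to confirm the layout is compliant.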

Python Naming convention [MUST]

File names: lowercase, words separated by an underscore, e.g., my_file.py.
language, spelling:
- Class names, function names, and variable names are written in English. Use meaningful and grammatically correct names.
- Use verbs to name functions and methods (they are actions), and nouns to name variables and classes (they are things).
class name: start each word with an uppercase letter, with no separators (CapWords, also called PascalCase), e.g., CatalogInformation.
function name:
- lowercase, words separated by an underscore.
- add self as the first argument if the function is an instance method of a class.
- if the function's name clashes with a reserved word, append a trailing underscore.
- use two leading underscores for private methods (Python applies name mangling to them).
- use one leading underscore for private fields.
- annotate the type of each input parameter.
- always annotate the return type (use 'None' if the function returns nothing).
variable name: lowercase, words separated by an underscore.
constant name: uppercase, words separated by an underscore.

my_variable = 10

GLOBAL_CONSTANT = 10


class CatalogInformation:
    def __init__(self, name: str) -> None:
        # Body constructor
        self._name = name

    def get_metadata_count(self) -> int:
        # Body method
        return 1

    def __check_internal_property(self) -> bool:
        # Body private method
        return True
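A slightly fuller sketch applying the naming rules above. ReportBuilder and all of its members are hypothetical names chosen for illustration only.

```python
class ReportBuilder:
    """Hypothetical class illustrating the naming rules."""

    MAX_SECTIONS = 10  # constant: uppercase, words separated by underscores

    def __init__(self, title: str) -> None:
        self._title = title  # private field: one leading underscore
        self._sections: list[str] = []

    def add_section(self, text: str) -> None:  # method named with a verb
        if self.__has_room():
            self._sections.append(text)

    def render(self, class_: str = "report") -> str:
        # 'class' is a reserved word, so the parameter gets a trailing underscore
        return f"[{class_}] {self._title}: {len(self._sections)} section(s)"

    def __has_room(self) -> bool:  # private method: two leading underscores
        return len(self._sections) < self.MAX_SECTIONS
```

Because of name mangling, the private method is only reachable from outside the class as builder._ReportBuilder__has_room(), which discourages accidental use.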

Python Comments / documentation [MUST]

Use comments to explain complex algorithms. Unless a function is trivial, always add a high-level comment describing what it does.

Use docstrings to document your code. Docstrings are used to document the purpose of a function, the parameters it takes, and the return value. They are used to generate documentation automatically. Google-style docstrings are recommended.

def is_valid(a: int, b: str) -> bool:
    """
    Explanation of the function.

    Args:
        a (int): Description of the first parameter.
        b (str): Description of the second parameter.

    Returns:
        bool: Description of the return value.
    """

Python Tuples, Lists, Dictionaries [MUST]

Use tuples when data should not change (they are immutable), dictionaries when you need to map keys to values, and lists when your data can change later.
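A minimal sketch of these three choices; the names and values are illustrative placeholders, not real data.

```python
# Tuple: a fixed record that should not change, e.g. (symbol, quantity)
position: tuple[str, int] = ("AAA", 2)

# Dictionary: maps keys to values, e.g. symbol -> last known price
last_prices: dict[str, float] = {"AAA": 10.0, "BBB": 20.0}

# List: an ordered collection expected to change over time
open_orders: list[str] = ["order-1"]
open_orders.append("order-2")

# position[1] = 3 would raise TypeError: tuples do not support item assignment
```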

A function can return multiple values as a tuple; there is no need to wrap them in a list:

def get_info() -> tuple[str, int]:
    """
    Return a string and an integer.

    Returns:
        tuple[str, int]: A tuple containing a string and an integer.
    """
    return "hello world", 30


message, count = get_info()  # The returned tuple is unpacked into two variables

Python Use context managers [MUST]

Context managers are a tool for situations where you need to run code that has preconditions and postconditions.

For instance, when you read the content of a file, you need to make sure its handle is closed whether the operation succeeds or fails. The with statement does this for you:

with open(filename) as fd:
    process_file(fd)

# Parenthesized context managers are officially supported since Python 3.10,
# which is useful when a single 'with' statement uses many of them
with (
    CtxManager1() as example1,
    CtxManager2() as example2,
    CtxManager3() as example3,
):
    ...

You can implement your own context manager when you need to execute actions in a certain order. Consider updating a service configuration: you first need to stop the service, then update the configuration, then start the service again:

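import subprocess
from types import TracebackType


class ServiceHandler:
    def __enter__(self) -> "ServiceHandler":
        # Precondition: stop the service before changing its configuration
        subprocess.run(["systemctl", "stop", "my.service"], check=True)
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc_value: BaseException | None,
        exc_traceback: TracebackType | None,
    ) -> None:
        # Postcondition: restart the service, even if the update raised
        subprocess.run(["systemctl", "start", "my.service"], check=True)


def update_service_conf() -> None:
    # Body to update service's configuration
    ...


if __name__ == "__main__":
    with ServiceHandler():
        update_service_conf()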
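The same stop/update/start sequence can also be written with contextlib.contextmanager from the standard library, which turns a generator into a context manager. In this sketch, service_stopped is a hypothetical helper and the events list stands in for the real systemctl calls, so the order of actions can be observed.

```python
from collections.abc import Iterator
from contextlib import contextmanager

# Records the order of actions; stands in for real systemctl calls in this sketch
events: list[str] = []


@contextmanager
def service_stopped(name: str) -> Iterator[None]:
    """Hypothetical helper: stop a service on entry, restart it on exit."""
    events.append(f"stop {name}")  # precondition
    try:
        yield
    finally:
        # Postcondition: runs even if the body of the with block raises
        events.append(f"start {name}")


with service_stopped("my.service"):
    events.append("update configuration")
```

The try/finally around yield plays the role of __exit__: the service is restarted even if the update raises. The class-based form is usually preferable when the context manager needs to hold state or be reused across modules.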

Python Exceptions handling [MUST]

Hiding an exception, or failing to anticipate potential errors (for instance, network issues when accessing an API), can lead to unexpected behavior or abrupt termination of the execution. Similarly, a function that receives invalid input parameters should notify the caller, rather than silently producing wrong results that are difficult to debug.

The exceptions a function raises should follow its logic. For instance, a function that gets data from an API should raise exceptions that make sense for that task: connection errors, timeouts, etc. Exceptions must be raised at the right level of abstraction.

If you choose to propagate the exception to the caller, make sure you do not expose sensitive information: exception tracebacks can contain sensitive details and may expose intellectual property.

Do not use exceptions for control flow, i.e., catching an exception and calling other business code from the except block: this makes the program flow harder to read. Exceptions are meant to notify the caller that something unexpected occurred.

Finally, observability is an important engineering fundamental. Properly handling exceptions and managing observability (with Azure Application Insights, for instance) leads to a more robust application and easier debugging when something unexpected occurs.

# Never do this: a bare except silently swallows every error
try:
    process_data()
except:
    pass

class MyApplicationException(Exception):
    """Custom exception raised by the application."""


# Encapsulate the original exception trace
def process_data() -> None:
    try:
        do_something()
    except KeyError as e:
        # Raise a specific exception from do_something,
        # chaining the original trace to a custom exception
        raise MyApplicationException("Item not present") from e
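The sketch below puts these rules together for a function that gets data from an API. fetch_prices, DataFetchError and the transport callable are hypothetical; the transport is injected so that a network failure can be simulated without a real API.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)


class DataFetchError(Exception):
    """Hypothetical domain error: market data could not be fetched."""


def fetch_prices(
    symbol: str, transport: Callable[[str], dict[str, float]]
) -> dict[str, float]:
    """Fetch prices for a symbol through a (hypothetical) API transport."""
    # Notify the caller about invalid input instead of returning wrong results
    if not symbol:
        raise ValueError("symbol must be a non-empty string")
    try:
        return transport(symbol)
    except (ConnectionError, TimeoutError) as e:
        # Log full details for observability; re-raise at the domain level
        logger.exception("Failed to fetch prices for %s", symbol)
        raise DataFetchError(f"Could not fetch prices for {symbol}") from e


def simulated_timeout(symbol: str) -> dict[str, float]:
    # Stand-in transport that simulates a network timeout
    raise TimeoutError(f"timed out while fetching {symbol}")


try:
    fetch_prices("AAA", simulated_timeout)
except DataFetchError as err:
    message = str(err)
    # "raise ... from e" keeps the original exception as __cause__
    cause_name = type(err.__cause__).__name__
```

The caller only sees a domain-level DataFetchError, while the low-level TimeoutError remains available for debugging through the chained trace and the log.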

Unit tests [MUST]

Unit tests are a core tool in software engineering. They help us verify the correctness of our code, encourage good design practices, and reduce the chance of bugs reaching production. They can also improve development efficiency.

Unit tests should be:

Reliable: should be 100% reliable so failures indicate a bug in the code.
Fast: should run in milliseconds.
Isolated: removing all external dependencies ensures reliability and speed.
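To keep a test isolated, external dependencies can be replaced with test doubles. A minimal sketch using unittest.mock.Mock from the standard library; latest_price and its client are hypothetical.

```python
import unittest
from typing import Any
from unittest.mock import Mock


def latest_price(client: Any, symbol: str) -> float:
    """Hypothetical function that depends on an external price API client."""
    return client.get_price(symbol)


class TestLatestPrice(unittest.TestCase):
    def test_returns_price_from_client(self) -> None:
        # A Mock stands in for the real client, so no network call is made
        client = Mock()
        client.get_price.return_value = 42.0

        self.assertEqual(latest_price(client, "AAA"), 42.0)
        # Also verify how the dependency was used
        client.get_price.assert_called_once_with("AAA")
```

This test needs no network access and runs in milliseconds, in line with the Fast and Isolated properties listed above.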

Python unit tests

For Python, we recommend using unittest. unittest is a powerful tool that makes it easy to write simple tests, but also scales to support complex functional testing for applications and libraries.

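from unittest import TestCase


def my_function(value: int) -> int:
    # Hypothetical function under test
    return value + 1


class TestMyClass(TestCase):

    @classmethod
    def setUpClass(cls) -> None:
        # Runs once, before all tests in this class
        pass

    def setUp(self) -> None:
        # Runs before each test method
        pass

    def test_my_function(self) -> None:
        # Test the function
        self.assertEqual(my_function(1), 2)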

Refer to the unittest documentation for more useful patterns.
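Two further unittest patterns are often useful: subTest, to check several input cases in one test, and assertRaises, to verify that invalid input raises the expected exception. safe_divide is a hypothetical function used only for illustration.

```python
import unittest


def safe_divide(a: float, b: float) -> float:
    """Hypothetical function under test."""
    if b == 0:
        raise ValueError("b must be non-zero")
    return a / b


class TestSafeDivide(unittest.TestCase):
    def test_valid_inputs(self) -> None:
        # subTest reports each failing case separately instead of stopping early
        cases = [(6, 3, 2.0), (1, 4, 0.25), (-9, 3, -3.0)]
        for a, b, expected in cases:
            with self.subTest(a=a, b=b):
                self.assertEqual(safe_divide(a, b), expected)

    def test_zero_divisor_raises(self) -> None:
        # assertRaises checks that invalid input is reported to the caller
        with self.assertRaises(ValueError):
            safe_divide(1, 0)
```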
